ABSTRACT

In this research, we develop a new search direction for conjugate gradient algorithms based on a convex
combination. The developed algorithm is shown to converge under some assumptions. The numerical results show
the efficiency of the developed method on unconstrained nonlinear optimization test problems.
1.  INTRODUCTION

Conjugate gradient methods represent an important class of unconstrained optimization algorithms with
strong local and global convergence properties and modest memory requirements. An excellent survey of the
development of different versions of nonlinear conjugate-gradient methods, with special attention to global
convergence properties, is presented by Hager and Zhang [8]. This family of algorithms includes many variants,
well known in the literature, with important convergence properties and numerical efficiency. The purpose of this
section is to present these algorithms as well as their performance in solving a large variety of large-scale
unconstrained optimization problems. For solving the nonlinear unconstrained optimization problem

$$\min \{ f(x) : x \in \mathbb{R}^n \} \tag{1}$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function that is bounded below, starting from an
initial guess $x_1 \in \mathbb{R}^n$, a nonlinear conjugate-gradient method generates a sequence $\{x_k\}$ as:

$$x_{k+1} = x_k + \alpha_k d_k \tag{2}$$

where $\alpha_k > 0$ is obtained by line search, and the directions $d_k$ are generated as:

$$d_{k+1} = -g_{k+1} + \beta_k s_k, \quad d_1 = -g_1 \tag{3}$$

In (3), $\beta_k$ is known as the conjugate gradient parameter, $s_k = x_{k+1} - x_k$ and $g_k = \nabla f(x_k)$.

Let $\|\cdot\|$ denote the Euclidean norm and define $y_k = g_{k+1} - g_k$. The line search in conjugate-gradient
algorithms is often based on the standard Wolfe conditions:

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho \alpha_k g_k^T d_k \tag{4}$$

$$g_{k+1}^T d_k \ge \sigma g_k^T d_k \tag{5}$$

where $d_k$ is a descent direction and $0 < \rho < \sigma < 1$. For some conjugate-gradient algorithms, stronger versions
of the Wolfe conditions are needed to ensure convergence and to enhance stability. According to the formula for
computing $\beta_k$, the conjugate-gradient algorithms can be classified as classical, hybrid, scaled, modified and parametric.
In the following, we present these algorithms and emphasize their numerical performance, measured by Dolan and Moré
performance profiles, for solving large-scale unconstrained optimization problems. The history of the conjugate-gradient
method begins with the seminal paper of Hestenes and Stiefel [9], who presented an algorithm for solving symmetric,
positive definite linear algebraic systems. In 1964 Fletcher and Reeves [6] extended the domain of application of the
conjugate-gradient method to nonlinear problems, thus starting the nonlinear conjugate-gradient research direction. The
main advantages of the conjugate-gradient method are its low memory requirements and its convergence speed. A large
variety of nonlinear conjugate gradient algorithms are known. For each of them, convergence results have been proved
under mild conditions, namely Lipschitz and boundedness assumptions. To prove the global convergence of nonlinear
conjugate-gradient methods, the Zoutendijk condition is often combined with an analysis showing that the sufficient
descent condition $g_k^T d_k \le -c \|g_k\|^2$ holds and that there exists a constant $\delta$ such that
$\|d_k\|^2 \le \delta k$. Often, the convergence analysis of conjugate gradient algorithms for general nonlinear
functions follows insights developed by Gilbert and Nocedal [7]. The idea is to bound the change $u_{k+1} - u_k$ in the
normalized direction $u_k = d_k / \|d_k\|$, which is used to conclude, by contradiction, that the gradients cannot be
bounded away from zero.
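As a concrete illustration of the Wolfe conditions (4) and (5), the following minimal Python sketch tests whether a given step size is acceptable. The function name `satisfies_wolfe`, the default values `rho=1e-4` and `sigma=0.9`, and the quadratic test problem are assumptions made for illustration only; they are not taken from the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_wolfe(f, grad, x, d, alpha, rho=1e-4, sigma=0.9):
    """Return True if the step size alpha satisfies conditions (4) and (5).

    rho and sigma are assumed defaults with 0 < rho < sigma < 1.
    """
    slope = dot(grad(x), d)  # g_k^T d_k, negative for a descent direction
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) - f(x) <= rho * alpha * slope  # condition (4)
    curvature = dot(grad(x_new), d) >= sigma * slope              # condition (5)
    return sufficient_decrease and curvature


# Hypothetical test problem: f(x) = x_1^2 + 2 x_2^2.
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 4.0 * x[1]]

x = [1.0, 1.0]
d = [-gi for gi in grad(x)]  # steepest descent direction

print(satisfies_wolfe(f, grad, x, d, alpha=0.25))  # True: both conditions hold
print(satisfies_wolfe(f, grad, x, d, alpha=1e-6))  # False: step too short for (5)
print(satisfies_wolfe(f, grad, x, d, alpha=1.0))   # False: step too long for (4)
```

Condition (4) rejects steps that are too long, while condition (5) rejects steps that are too short, so together they bracket an interval of acceptable step sizes.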

2.  CLASSICAL  CONJUGATE  GRADIENT  ALGORITHMS 

These algorithms are defined by (2) and (3), where the parameter $\beta_k$ is computed as in Table 1. Observe that
these algorithms can be classified as algorithms with $\|g_{k+1}\|^2$ in the numerator of $\beta_k$ and algorithms with
$g_{k+1}^T y_k$ in the numerator of $\beta_k$. The FR, CD and DY methods (see the tables for the definitions of the
acronyms used for the algorithms throughout the text), with $\|g_{k+1}\|^2$ in the numerator of $\beta_k$, have strong
convergence theory, but modest practical performance in the performance profiles of conjugate gradient algorithms for
unconstrained optimization.
Table 1: Classical Conjugate Gradient Algorithms [1]

No.   Formula                                                        Author
1.    $\beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k}$               Hestenes-Stiefel (HS), 1952
2.    $\beta_k^{FR} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$           Fletcher-Reeves (FR), 1964
3.    $\beta_k^{PR} = \frac{g_{k+1}^T y_k}{g_k^T g_k}$               Polak-Ribiere (PR), 1969
4.    $\beta_k^{DX} = -\frac{g_{k+1}^T g_{k+1}}{d_k^T g_k}$          Dixon (DX), 1975
5.    $\beta_k^{CD} = -\frac{g_{k+1}^T g_{k+1}}{d_k^T g_k}$          Fletcher (CD), 1987
6.    $\beta_k^{LS} = -\frac{g_{k+1}^T y_k}{d_k^T g_k}$              Liu-Storey (LS), 1991
7.    $\beta_k^{DY} = \frac{g_{k+1}^T g_{k+1}}{d_k^T y_k}$           Dai-Yuan (DY), 1999

The FR, CD and DY methods are susceptible to jamming: they begin to take small steps without making any significant
progress toward the minimum. On the other hand, the HS, PR and LS methods, with $g_{k+1}^T y_k$ in the numerator of
$\beta_k$, have a built-in restart feature that addresses the jamming phenomenon. When the step $s_k$ is small, the factor
$y_k = g_{k+1} - g_k$ in the numerator of $\beta_k$ tends to zero. Therefore, $\beta_k$ becomes small and the new
direction $d_{k+1}$ in (3) is essentially the steepest descent direction $-g_{k+1}$. In other words, the HS, PR and LS
methods automatically adjust $\beta_k$ to avoid jamming, and their performance is better than that of methods with
$\|g_{k+1}\|^2$ in the numerator of $\beta_k$.
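The effect of the numerator on $\beta_k$ for a small step can be seen numerically. The sketch below evaluates the seven classical parameters for one iteration; the function name `classical_betas` and the sample gradients are hypothetical, and the minus signs used for DX, CD and LS follow the standard definitions of these parameters, which is an assumption about the intended formulas.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def classical_betas(g_new, g_old, d_old):
    """Evaluate the classical conjugate gradient parameters for one iteration.

    g_new = g_{k+1}, g_old = g_k, d_old = d_k and y_k = g_{k+1} - g_k.
    """
    y = [a - b for a, b in zip(g_new, g_old)]
    return {
        "HS": dot(g_new, y) / dot(d_old, y),
        "FR": dot(g_new, g_new) / dot(g_old, g_old),
        "PR": dot(g_new, y) / dot(g_old, g_old),
        "DX": -dot(g_new, g_new) / dot(d_old, g_old),
        "CD": -dot(g_new, g_new) / dot(d_old, g_old),
        "LS": -dot(g_new, y) / dot(d_old, g_old),
        "DY": dot(g_new, g_new) / dot(d_old, y),
    }


# Hypothetical data for a very small step: the gradient barely changes.
g_old = [1.0, 2.0]
g_new = [1.001, 1.999]
d_old = [-1.0, -2.0]  # steepest descent direction at the old point

betas = classical_betas(g_new, g_old, d_old)
print(betas["FR"])  # stays close to 1, so the old direction keeps its weight
print(betas["PR"])  # close to 0, so the new direction is nearly -g_{k+1}
```

With these sample values the FR parameter stays near one while the PR parameter nearly vanishes, which is the automatic restart behaviour described above.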

This paper is organized as follows. Section 3 presents the new conjugate gradient algorithm for solving unconstrained
optimization problems. In Section 4, we show that our algorithm satisfies the descent condition at every iteration. In
Section 5, we show that our algorithm satisfies the global convergence condition. Section 6 presents numerical
experiments and comparisons.

3.  NEW  ALGORITHM  FOR  CG  TO  SOLVE  UNCONSTRAINED  OPTIMIZATION  PROBLEMS 


We define the new search direction as follows:

$$d_{k+1} = -\theta_k g_{k+1} + (1 - \theta_k) \beta_k d_k \tag{6}$$

$$\theta_k = \frac{|g_{k+1}^T d_k|}{y_k^T d_k} \tag{7}$$

where $0 < \theta_k < 1$.
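The new direction can be evaluated in a few lines. The following is a minimal sketch, assuming that the weight is $\theta_k = |g_{k+1}^T d_k| / (y_k^T d_k)$ (a reading of (7) that matches the quantities used in the descent proof) and that $\beta_k$ is supplied by one of the classical formulas. The function name `new_direction` and the sample vectors are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def new_direction(g_new, g_old, d_old, beta):
    """Direction (6): d_{k+1} = -theta_k g_{k+1} + (1 - theta_k) beta_k d_k.

    theta_k = |g_{k+1}^T d_k| / (y_k^T d_k) is an assumed reading of (7).
    Requires y_k^T d_k > 0, which the Wolfe conditions provide.
    """
    y = [a - b for a, b in zip(g_new, g_old)]
    theta = abs(dot(g_new, d_old)) / dot(y, d_old)
    d_new = [-theta * gn + (1.0 - theta) * beta * do
             for gn, do in zip(g_new, d_old)]
    return d_new, theta


# Hypothetical iteration data.
g_old = [2.0, 4.0]
d_old = [-2.0, -4.0]
g_new = [1.0, 0.5]

beta_fr = dot(g_new, g_new) / dot(g_old, g_old)  # Fletcher-Reeves parameter
d_new, theta = new_direction(g_new, g_old, d_old, beta_fr)

print(theta)              # weight on the steepest descent term
print(dot(g_new, d_new))  # negative here, so d_new is a descent direction
```

For this data the weight lies strictly between zero and one, so the new direction is a genuine convex combination of $-g_{k+1}$ and $\beta_k d_k$.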

New  Algorithm 

Step 1: Initialization. Select $x_1 \in \mathbb{R}^n$ and the parameter $\varepsilon > 0$.
Compute $f(x_1)$ and $g_1$. Consider $d_1 = -g_1$ and set the initial
guess $\alpha_1 = 1/\|g_1\|$.

Step 2: Test for continuation of iterations. If $\|g_{k+1}\| \le 10^{-6}$, then stop.

Step 3: Line search. Compute $\alpha_k > 0$ satisfying the Wolfe line search
conditions (4) and (5) and update the variables $x_{k+1} = x_k + \alpha_k d_k$.

Step 4: Direction computation. Compute $d = -\theta_k g_{k+1} + (1 - \theta_k) \beta_k d_k$ with
$\theta_k = |g_{k+1}^T d_k| / (y_k^T d_k)$. If the restart criterion $|g_{k+1}^T g_k| \ge 0.2 \|g_{k+1}\|^2$ is
satisfied, then set $d_{k+1} = -g_{k+1}$. Otherwise define $d_{k+1} = d$. Compute the initial step
size $\alpha_k = \alpha_{k-1} \|d_{k-1}\| / \|d_k\|$, set $k = k + 1$ and continue with Step 2.
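Steps 1 to 4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' Fortran implementation: the bisection line search `wolfe_step`, the values `rho=1e-4` and `sigma=0.9`, the reading of $\theta_k$ as $|g_{k+1}^T d_k| / (y_k^T d_k)$, the extra safeguard that falls back to $-g_{k+1}$ when the computed direction is not a descent direction, and the quadratic test problem are all assumptions.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def wolfe_step(f, grad, x, d, alpha, rho=1e-4, sigma=0.9, max_iter=60):
    """Bisection search for a step satisfying the Wolfe conditions (4)-(5).

    If the budget runs out, return the last step that satisfied (4), or 0.0.
    """
    fx, slope = f(x), dot(grad(x), d)
    lo, hi = 0.0, math.inf
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) - fx > rho * alpha * slope:      # (4) fails: step too long
            hi = alpha
        elif dot(grad(x_new), d) < sigma * slope:    # (5) fails: step too short
            lo = alpha
        else:
            return alpha
        alpha = 2.0 * lo if hi == math.inf else 0.5 * (lo + hi)
    return lo


def new_cg(f, grad, x, beta_rule, tol=1e-6, max_iter=500):
    """Sketch of Steps 1-4 using the direction (6)."""
    g = grad(x)
    gnorm = math.sqrt(dot(g, g))
    if gnorm <= tol:
        return x
    d = [-gi for gi in g]                            # Step 1: d_1 = -g_1
    alpha = 1.0 / gnorm                              # Step 1: alpha_1 = 1/||g_1||
    for _ in range(max_iter):
        alpha = wolfe_step(f, grad, x, d, alpha)     # Step 3: line search
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        if math.sqrt(dot(g_new, g_new)) <= tol:      # Step 2: stopping test
            break
        steepest = [-gi for gi in g_new]
        yd = dot(g_new, d) - dot(g, d)               # y_k^T d_k
        powell = abs(dot(g_new, g)) >= 0.2 * dot(g_new, g_new)
        if powell or yd <= 0.0:
            d_new = steepest                         # Step 4: restart
        else:
            theta = abs(dot(g_new, d)) / yd          # assumed reading of (7)
            beta = beta_rule(g_new, g, d)
            d_new = [-theta * gi + (1.0 - theta) * beta * di
                     for gi, di in zip(g_new, d)]
            if dot(g_new, d_new) >= 0.0:             # assumed safeguard
                d_new = steepest
        # Step 4: initial step for the next line search.
        scale = math.sqrt(dot(d, d) / dot(d_new, d_new))
        alpha = alpha * scale if alpha > 0.0 else 1.0
        g, d = g_new, d_new
    return x


# Hypothetical test problem: a convex quadratic with minimizer at the origin.
f = lambda x: 0.5 * (x[0] ** 2 + 2.0 * x[1] ** 2 + 3.0 * x[2] ** 2)
grad = lambda x: [x[0], 2.0 * x[1], 3.0 * x[2]]
beta_fr = lambda g_new, g, d: dot(g_new, g_new) / dot(g, g)

x0 = [1.0, 1.0, 1.0]
x_star = new_cg(f, grad, x0, beta_fr)
print(f(x0), f(x_star))
```

Passing the $\beta_k$ formula as the argument `beta_rule` makes it easy to switch between the FR, DY and DX variants without touching the main loop.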

4.  THE  DESCENT  PROPERTY  OF  THE  NEW  METHOD 

We now show the descent property of the proposed new scaled conjugate gradient algorithm in the following theorem.

Theorem (1)

The search direction defined by

$$d_{k+1} = -\theta_k g_{k+1} + (1 - \theta_k) \beta_k d_k \tag{9}$$

satisfies the descent property for all $k \ge 1$.

Proof 


The proof is by induction.

If $k = 1$, then $d_1^T g_1 = -\|g_1\|^2 < 0$.

Assume that $g_k^T d_k < 0$.

Note that from the Wolfe conditions we have $d_k^T y_k > 0$.
We prove that the relation also holds for $k + 1$. Multiplying equation (9) by $g_{k+1}^T$, we obtain

$$d_{k+1}^T g_{k+1} = -\theta_k g_{k+1}^T g_{k+1} + (1 - \theta_k) \beta_k g_{k+1}^T d_k \tag{10}$$

$$d_{k+1}^T g_{k+1} = -\frac{|g_{k+1}^T d_k|}{y_k^T d_k} \|g_{k+1}\|^2 + \left(1 - \frac{|g_{k+1}^T d_k|}{y_k^T d_k}\right) \beta_k g_{k+1}^T d_k \tag{11}$$

$$d_{k+1}^T g_{k+1} = -\frac{|g_{k+1}^T d_k|}{y_k^T d_k} \|g_{k+1}\|^2 + \beta_k g_{k+1}^T d_k - \frac{|g_{k+1}^T d_k|}{y_k^T d_k} \beta_k g_{k+1}^T d_k \tag{12}$$

Let

$$\frac{|g_{k+1}^T d_k|}{y_k^T d_k} \|g_{k+1}\|^2 > \beta_k g_{k+1}^T d_k - \frac{|g_{k+1}^T d_k|}{y_k^T d_k} \beta_k g_{k+1}^T d_k,$$

then

$$d_{k+1}^T g_{k+1} < 0 \tag{13}$$

5.  GLOBAL  CONVERGENCE  ANALYSIS

Next we show that the CG method with the new search direction (6) converges globally. We need the following
assumption for the convergence of the proposed new algorithm.

Assumption (1) [4]

1- $f$ is bounded below on the level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$, where $x_0$ is some initial point.

2- $f$ is continuously differentiable and its gradient is Lipschitz continuous, that is, there exists $L > 0$ such that

$$\|g(x) - g(y)\| \le L \|x - y\| \quad \forall x, y \in N \tag{14}$$

3- $f$ is a uniformly convex function, that is, there exists a constant $\mu > 0$ such that

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \mu \|x - y\|^2 \quad \text{for any } x, y \in S \tag{15}$$

or equivalently

$$y_k^T s_k \ge \mu \|s_k\|^2 \quad \text{and} \quad \mu \|s_k\|^2 \le y_k^T s_k \le L \|s_k\|^2 \tag{16}$$

On the other hand, under Assumption (1), it is clear that there exist positive constants $B$ and $\gamma$ such that

$$\|x\| \le B \quad \forall x \in S \tag{17}$$

$$\|\nabla f(x)\| \le \gamma \quad \forall x \in S \tag{18}$$
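The curvature bounds in (16) can be checked numerically on a simple case. The sketch below assumes a separable quadratic $f(x) = \frac{1}{2} \sum_i a_i x_i^2$ with positive coefficients, for which the uniform convexity constant is $\mu = \min_i a_i$ and the Lipschitz constant is $L = \max_i a_i$. The coefficients, the two points and the helper name `curvature_bounds_hold` are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical curvatures of f(x) = 0.5 * sum(a_i * x_i^2).
a = [1.0, 2.0, 5.0]
grad = lambda x: [ai * xi for ai, xi in zip(a, x)]
mu, L = min(a), max(a)  # uniform convexity and Lipschitz constants for this f


def curvature_bounds_hold(x_old, x_new):
    """Check mu ||s_k||^2 <= y_k^T s_k <= L ||s_k||^2 for one pair of iterates."""
    s = [b - c for b, c in zip(x_new, x_old)]              # s_k = x_{k+1} - x_k
    y = [b - c for b, c in zip(grad(x_new), grad(x_old))]  # y_k = g_{k+1} - g_k
    ys, ss = dot(y, s), dot(s, s)
    return mu * ss <= ys <= L * ss


print(curvature_bounds_hold([1.0, -2.0, 0.5], [0.0, 1.0, 1.0]))  # True
```

For this pair of points $y_k^T s_k = 20.25$ and $\|s_k\|^2 = 10.25$, so the value lies between $\mu \|s_k\|^2 = 10.25$ and $L \|s_k\|^2 = 51.25$.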

Lemma (1) [4, 5 and 10]

Suppose that Assumption (1) and equation (17) hold true. Consider any conjugate gradient method of the form (2) and
(3), where $d_k$ is a descent direction and $\alpha_k$ is obtained by the strong Wolfe line search. If

$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} = \infty \tag{19}$$

then we have

$$\liminf_{k \to \infty} \|g_k\| = 0 \tag{20}$$

More details can be found in [3 and 6].

Theorem  (2) 

Suppose that Assumption (1), equation (17) and the descent condition hold. Consider a conjugate gradient
method of the form

$$d_{k+1} = -\theta_k g_{k+1} + (1 - \theta_k) \beta_k d_k$$

where $\alpha_k$ is computed from the Wolfe line search conditions (4) and (5). If the objective function is
uniformly convex on the set $S$, then $\liminf_{k \to \infty} \|g_k\| = 0$.


Proof 


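First, substituting our $\beta_k$ into the direction $d_{k+1}$, we obtain:

$$d_{k+1} = -\theta_k g_{k+1} + (1 - \theta_k) \beta_k d_k$$

After simplifying the above equation we get

$$\|d_{k+1}\|^2 = \|-\theta_k g_{k+1} + (1 - \theta_k) \beta_k d_k\|^2$$

$$\|d_{k+1}\|^2 \le \theta_k \|g_{k+1}\|^2 + \beta_k \|d_k\|^2 + \theta_k \beta_k \|d_k\|^2$$

$$\|d_{k+1}\|^2 \le \theta_k \|g_{k+1}\|^2 + (1 + \theta_k) \beta_k \|d_k\|^2$$

Suppose that $b = \theta_k$ and $c = (1 + \theta_k) \beta_k$. Then

$$\|d_{k+1}\|^2 \le b \|g_{k+1}\|^2 + c \|d_k\|^2$$

$$\|d_{k+1}\|^2 \le b \gamma^2 + c \|d_k\|^2$$

where $\gamma$ is the bound on the gradient norm in (18). Suppose that $\tau = b \gamma^2 + c \|d_k\|^2$. Then

$$\|d_{k+1}\|^2 \le \tau \tag{29}$$

$$\frac{1}{\|d_{k+1}\|^2} \ge \frac{1}{\tau} \tag{30}$$

$$\sum_{k \ge 1} \frac{1}{\|d_{k+1}\|^2} \ge \frac{1}{\tau} \sum_{k \ge 1} 1 = \infty \tag{31}$$

Hence, by Lemma (1),

$$\liminf_{k \to \infty} \|g_k\| = 0 \tag{32}$$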


6.  NUMERICAL  RESULTS  AND  COMPARISONS 

In this section, we report some preliminary numerical results. We compare the new direction $d_{k+1}$ of the proposed
scaled conjugate gradient algorithm for unconstrained optimization with the classical conjugate gradient direction,
using four classical choices of $\beta_k$, including $\beta_k^{FR}$, $\beta_k^{DY}$ and $\beta_k^{DX}$ (see the Notes
below). We have selected 75 large-scale unconstrained optimization test problems taken from Andrei (2008) [2]. For
each test function we have considered numerical experiments with the number of variables $n = 100, \ldots, 1000$. All
these algorithms are implemented with the standard Wolfe line search conditions (4) and (5). In all these cases, the
stopping criterion is $\|g_k\| \le 10^{-6}$. All codes are written in double precision Fortran with F77 default compiler
settings. The test functions start from the standard initial points, and the numerical results are summarized in
Figures (1), (2) and (3), produced with MATLAB. The performance profile of Dolan and Moré [5] is used to display the
performance of the proposed new scaled conjugate gradient direction against the classical conjugate gradient direction.
Let $P$ be the whole set of $n_p = 750$ test problems and $S$ the set of the 8 solvers of interest. Let $l_{p,s}$ be the
number of objective function evaluations required by solver $s$ for problem $p$. Define the performance ratio as:

$$r_{p,s} = \frac{l_{p,s}}{l_p^*} \tag{33}$$

where $l_p^* = \min\{l_{p,s} : s \in S\}$. It is obvious that $r_{p,s} \ge 1$ for all $p, s$. If a solver fails to solve
a problem, the ratio $r_{p,s}$ is assigned a large number $M$. The performance profile for each solver $s$ is defined as
the following cumulative distribution function for the performance ratio $r_{p,s}$:

$$\rho_s(\tau) = \frac{\mathrm{size}\{p \in P : r_{p,s} \le \tau\}}{n_p} \tag{34}$$
Obviously, $\rho_s(1)$ represents the percentage of problems for which solver $s$ is the best. See [5] for more details
about the performance profile. The performance profile can also be used to analyze the number of iterations, the number of
gradient evaluations and the CPU time. Besides, to get a clear observation, we give the horizontal coordinate a log-scale in
the following figures.
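The performance ratio (33) and the profile (34) are straightforward to compute. The sketch below is an illustration under stated assumptions: the function name `performance_profile`, the use of `None` to mark a failed run, the choice of infinity as the large number $M$, and the cost data are all hypothetical and are not results from this paper.

```python
def performance_profile(costs, tau, failure=float("inf")):
    """Return rho_s(tau) from (34) for every solver.

    costs maps a solver name to a list of positive costs l_{p,s}, one per
    problem, with None marking a failed run. failure plays the role of M.
    """
    solvers = list(costs)
    n_p = len(costs[solvers[0]])
    counts = {s: 0 for s in solvers}
    for p in range(n_p):
        solved = [costs[s][p] for s in solvers if costs[s][p] is not None]
        if not solved:
            continue  # every solver failed, so no ratio is <= tau
        best = min(solved)  # l_p^* in (33)
        for s in solvers:
            c = costs[s][p]
            ratio = failure if c is None else c / best  # r_{p,s} in (33)
            if ratio <= tau:
                counts[s] += 1
    return {s: counts[s] / n_p for s in solvers}


# Hypothetical function-evaluation counts for two solvers on four problems.
costs = {
    "FRN": [10, 20, None, 40],
    "FRD": [12, 18, 30, 80],
}

print(performance_profile(costs, tau=1.0))  # {'FRN': 0.5, 'FRD': 0.5}
print(performance_profile(costs, tau=2.0))  # {'FRN': 0.75, 'FRD': 1.0}
```

At $\tau = 1$ each solver is the best on half of the problems, while at $\tau = 2$ the second solver reaches all problems and the first stays at 0.75 because of its failed run.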

Notes 


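1- The Wolfe conditions (4) and (5) are used to choose $\alpha_k$ [11].

2- Definition of algorithms

FRN => $d_{k+1} = -\theta_k g_{k+1} + (1 - \theta_k) \beta_k^{FR} d_k$

FRD => $d_{k+1} = -g_{k+1} + \beta_k^{FR} d_k$

DYN => $d_{k+1} = -\theta_k g_{k+1} + (1 - \theta_k) \beta_k^{DY} d_k$

DYD => $d_{k+1} = -g_{k+1} + \beta_k^{DY} d_k$

DXN => $d_{k+1} = -\theta_k g_{k+1} + (1 - \theta_k) \beta_k^{DX} d_k$

DXD => $d_{k+1} = -g_{k+1} + \beta_k^{DX} d_k$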

[Figure (1): performance profile plot titled "Performance based on epoch"]

[Figure (2): Performance Based on Function; plot titled "Function Gradient Performance"]

[Figure (3): Performance Based on Time]